1 Phylogenetic Tree Comparison

In this module we break down phylogenetic trees and see how R can help us compare them.

1.1 Packages

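To start, you will need to install {ape} and {phytools}. The examples below also load {fastmatch}, {quadprog}, {phangorn}, {geiger}, {ggplot2}, and {ggtree}.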

1.2 Objectives

Objectives

1.3 Introduction

Evolutionary biologists study the ways organisms change over time. This is often done by comparing several species to one another and analyzing the differences between them. These differences can then be used to build a graphical representation of species’ relationships to one another, called a phylogenetic tree. Originally, these trees were constructed based solely on organism morphology using shared derived traits, or synapomorphies, often in the form of character matrices. By focusing on these physical traits, biologists could form hypotheses about evolutionary history, such as species divergence or common descent. However, using morphological data to determine taxonomic relationships is not foolproof. Some shared traits are analogous rather than homologous, arising through convergent evolution. This kind of similarity, called homoplasy, can be difficult to detect through morphological analysis alone, and has historically resulted in phylogenetic trees that have since been disproved through the use of molecular data.

Modern biologists can now build phylogenetic trees based on the DNA of the organisms in question. By focusing on a few specific genes, evolutionary relationships can be hypothesized with greater accuracy. Examining the polymorphism at different loci can provide information on how distantly related organisms are to one another, and can reduce errors commonly associated with comparing or identifying cryptic species. Molecular data can help resolve unclear phylogenies that were created before genomic methods were available. Biologists can now revisit unresolved trees built from morphological data and revise them using DNA, allowing us to better hypothesize evolutionary relationships.

In order to assess any changes between these hypothesized evolutionary relationships, we can compare the topologies of phylogenetic trees. The topology refers to the branching pattern displayed, which represents the measure of relatedness among taxa. To understand the importance of topology, we must familiarize ourselves with the anatomy of a phylogenetic tree.

Anatomy of a tree:

Why would we compare tree topologies?

1.4 Methods

There are several different ways to estimate phylogenies, each with its own strengths, weaknesses, and appropriate situations in which to apply it. While we won’t go over them in detail, you can familiarize yourself with them by looking at Module 24.
There are many phylogenetics programs that implement these methods of estimation; some of the most common are PAUP, BEAST, MrBayes, and PHYLIP. These programs take genetic data, usually sequence alignments, in formats such as FASTA or NEXUS. Raw reads in FASTQ files generally have to be processed and aligned first.

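Types of Tree Formats: Newick, Nexus

A tree can be stored in several different text formats before it is drawn.
Newick: A plain-text notation that describes a tree with nested parentheses. Commas separate sister groups, an optional colon followed by a number gives a branch length (which may represent time or evolutionary distance), and a semicolon ends the tree. “Newick files are simply text files that consist of one or more tree descriptions in the Newick notation. In contrast to Nexus files they contain no further syntax elements or other information than the trees.”
Nexus: A block-structured text format that can hold trees (written in Newick notation) together with other information, such as taxon lists and character or sequence data.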
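
As a small illustration of the notation, the sketch below parses a made-up Newick string with branch lengths using read.tree() from {ape}. The taxa A, B, and C and the numbers are placeholders, not data from this module.

```r
library(ape)

# Made-up example: A and B are sisters; the number after each colon is a branch length
newick <- "((A:1,B:1):2,C:3);"
toy.tree <- read.tree(text = newick)

toy.tree$tip.label    # the taxon names found in the string
toy.tree$edge.length  # one branch length per edge
Ntip(toy.tree)        # number of tips
write.tree(toy.tree)  # converts the tree object back into Newick text
```

Leaving out the colons and numbers gives a tree with topology only, which is the form used for the hand-written trees later in this module.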

1.4.1 Comparing Trees

We will be comparing two graphical hypotheses of the genus Adelpha, a group of butterflies.

One tree is morphologically based, from Keith Wilmott’s 2003 paper “Cladistic analysis of the Neotropical butterfly genus Adelpha (Lepidoptera: Nymphalidae), with comments on the subtribal classification of Limenitidini.” Our molecular tree is from Emily Ebel’s 2015 paper “Rapid diversification associated with ecological specialization in Neotropical Adelpha butterflies.”

1.4.2 Example

To start off, we want to understand how a tree is created. For this example we will be making our own tree using Newick format.

Here is a great explanation of Newick format: “Put simply, monophyletic clades are surrounded by parentheses and sister clades are separated by commas. For example, a simple tree could be written as (((A,B),C),(D,E)).” Let’s try making that!

You can check which version of R you have installed by running the command “R.version”

R.version
##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          1.1                         
## year           2021                        
## month          08                          
## day            10                          
## svn rev        80725                       
## language       R                           
## version.string R version 4.1.1 (2021-08-10)
## nickname       Kick Things

Once you know you have the correct version of R, install and load the following packages

library(ape)
library(fastmatch)
library(quadprog)
library(phangorn)
library(phytools)
library(geiger)

You can check which version of each package you have installed by using the command “packageVersion("package-name")”

packageVersion("ape")
## [1] '5.5'

Once we have all the necessary packages loaded into our markdown file, we can start playing around with building some phylogenetic trees! If you already know the relationships between the groups of species that you want to plot, and the clade you are plotting isn’t too complex, you can simply write out the tree as a text string in Newick format! Let’s try it first with letters

text.string<-
    "(((A,B),C),(D,E));"
example.tree<-read.tree(text=text.string)#this command reads trees in Newick format like we did above
plot(example.tree,no.margin=TRUE,edge.width=2) 

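Looks good! Now we can try plotting a clade of whale species. Because read.tree() removes spaces from unquoted labels, we join the words in each name with underscores.

text.string<-
    "((((humpback_whale, fin_whale), (Antarctic_minke_whale, common_minke_whale)), bowhead_whale), sperm_whale);"
whale.tree<-read.tree(text=text.string)
plot(whale.tree,no.margin=TRUE,edge.width=2) #plot() shows the underscores as spaces by default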
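
A tree object can also be inspected without plotting it. The sketch below rebuilds the five-letter tree under the hypothetical name letters.tree and asks a few basic questions about it with functions from {ape}.

```r
library(ape)

# Same Newick string as the letter example above
letters.tree <- read.tree(text = "(((A,B),C),(D,E));")

Ntip(letters.tree)       # number of tips
Nnode(letters.tree)      # number of internal nodes
is.rooted(letters.tree)  # does the tree have a root?
is.binary(letters.tree)  # is every split two-way (no polytomies)?

# unroot() returns a copy of the tree with the root removed
is.rooted(unroot(letters.tree))
```

These checks become useful later, because two trees can only be compared meaningfully once we know whether each one is rooted and fully resolved.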

There are many different commands that will allow you to visualize your tree in many different ways. Let’s try a few!

roundPhylogram(whale.tree) #creates rounded branches in tree 

plot(unroot(whale.tree),type="unrooted",no.margin=TRUE,lab4ut="axial",
    edge.width=2) #creates an unrooted tree 

plotTree(whale.tree,type="fan",fsize=0.7,lwd=1,
    ftype="i") #creates a fan tree 

1.4.3 Method

  1. Load the tree files into R

  • The molecular tree is stored in a .tre file that we read with read.nexus(); the morphological tree is entered as a text string in Newick format.
  2. Adjust and manipulate the trees using the ggtree package to make the graphics more readable

  3. Compare the trees

  • Check whether the trees are equal using all.equal(mol.tree, morph.tree), and summarize their differences with comparePhylo(), both from the ape package
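
To make the idea of comparing topologies concrete, here is a minimal sketch using three made-up five-taxon trees. It uses dist.topo() from {ape}, which counts the splits that two trees do not share; this is an illustrative choice on our part, not a step from the workflow above.

```r
library(ape)

t1 <- read.tree(text = "(((A,B),C),(D,E));")
t2 <- read.tree(text = "((E,D),(C,(B,A)));")  # same branching pattern, written in a different order
t3 <- read.tree(text = "(((A,C),B),(D,E));")  # B and C have traded places

# Topological distance between unrooted versions of the trees
d12 <- dist.topo(unroot(t1), unroot(t2))
d13 <- dist.topo(unroot(t1), unroot(t3))

d12  # a distance of 0 means the two trees share every split
d13  # a larger value means more splits differ
```

The order in which clades are written or drawn does not change the topology, so t1 and t2 count as the same hypothesis even though their Newick strings look different.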

1.5 Loading the Tree Files

1.5.1 Molecular Tree

Now that we are a little more comfortable working with phylogenetic trees in R, we can load our first tree!

First, use this link (“https://github.com/sinnabunbun/Super-Fly-Group-Module”) to go to the Super Fly Group Module repo. From there, click on the “NJst.tre” file. Copy this data using the little pencil icon. Then go to your own working repo, select the “Create New File” option and paste the tree data into that file.

Once you have the tree file saved in your repo, you can load the tree into your R markdown file using the “read.nexus” command

mol.tree<-read.nexus(file="NJst.tre")
mol.tree
## 
## Phylogenetic tree with 66 tips and 64 internal nodes.
## 
## Tip labels:
##   A_rothschildi, A_sichaeus, A_boreas_boreas, A_saundersii_saundersii, A_attica_attica, A_leuceroides, ...
## 
## Unrooted; includes branch lengths.
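
A file ending in .tre can contain either Newick or NEXUS text, so it is worth checking which reader to use. The sketch below is one way to do that, using a toy tree written to a temporary file; tree.file, is.nexus, and reloaded are names invented for this illustration.

```r
library(ape)

toy <- read.tree(text = "(((A,B),C),(D,E));")

# Write the toy tree to a temporary file in NEXUS format
tree.file <- tempfile(fileext = ".tre")
write.nexus(toy, file = tree.file)

# NEXUS files begin with the token "#NEXUS"; Newick files begin with "("
first.line <- readLines(tree.file, n = 1)
is.nexus <- grepl("^#NEXUS", first.line, ignore.case = TRUE)

# Choose the reader that matches the file contents
reloaded <- if (is.nexus) read.nexus(tree.file) else read.tree(tree.file)
reloaded
```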

Once the tree is loaded, we can try plotting it!

plotTree(mol.tree,ftype="i",fsize=0.6,lwd=2, no.margin = TRUE)

The Ntip() function will tell you how many different species (or tips) are represented in your tree

Ntip(mol.tree) ##66 species in this tree
## [1] 66

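Just like with the whale tree above, we can draw an unrooted layout. The printout above shows that mol.tree is already stored as an unrooted tree, so “unroot” leaves it unchanged here.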

plot(unroot(mol.tree),type="unrooted",cex=0.6,
    use.edge.length=FALSE,lab4ut="axial",
    no.margin=TRUE)

##unrooted tree 

We can also make a fan tree

plotTree(mol.tree,type="fan",fsize=0.7,lwd=2,
    ftype="i") 

If you want to see all the species names, use the code mol.tree$tip.label

##all the species names 
mol.tree$tip.label
##  [1] "A_rothschildi"             "A_sichaeus"               
##  [3] "A_boreas_boreas"           "A_saundersii_saundersii"  
##  [5] "A_attica_attica"           "A_leuceroides"            
##  [7] "A_zina_irma"               "A_zina_zina"              
##  [9] "A_jordani"                 "A_justina_valentina"      
## [11] "A_olynthia"                "A_boeotia_boeotia"        
## [13] "A_malea_aethalia"          "A_delinita"               
## [15] "A_heraclea"                "A_naxia"                  
## [17] "A_capucinus_capucinus"     "A_mesentina"              
## [19] "A_phylaca_pseudaethalia"   "A_lycorias_lara"          
## [21] "A_lycorias_spruceana"      "A_erotia_erotia"          
## [23] "A_pollina"                 "A_irmina_tumida"          
## [25] "A_leucophthalma_irminella" "A_cocala_cocala"          
## [27] "L_lorquini"                "L_weidemeyerii"           
## [29] "L_arthemis_arizonensis"    "L_arthemis_arthemis"      
## [31] "L_archippus_floridanensis" "L_populi"                 
## [33] "L_sydyi"                   "L_amphyssa"               
## [35] "L_moltrechti"              "L_doerriesi"              
## [37] "L_helmanni"                "L_camilla"                
## [39] "L_homeyeri"                "L_glorifica"              
## [41] "L_reducta"                 "A_donysa_donysa"          
## [43] "A_pithys"                  "A_alala_negra"            
## [45] "A_tracta"                  "A_corcyra_aretina"        
## [47] "Athyma_selenophora"        "Pandita_sinope"           
## [49] "Sumalia_daraxa"            "Parasarpa_zayla"          
## [51] "Moduza_urdaneta"           "A_seriphia_aquillia"      
## [53] "A_seriphia_therasia"       "A_serpa_celerio"          
## [55] "A_melona_leucocoma"        "A_cytherea_cytherea"      
## [57] "A_cytherea_daguana"        "A_salmoneus_colada"       
## [59] "A_iphicleola_thessalita"   "A_iphiclus_iphiclus"      
## [61] "A_thessalia_thessalia"     "A_epione_agilla"          
## [63] "A_ethelda_ethelda"         "A_basiloides"             
## [65] "A_plesaure_phliassa"       "A_shuara"
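
Because the tip labels are an ordinary character vector, they can be searched and summarized with base R. The sketch below uses six labels copied from the list above and groups them by the text before the first underscore. We assume here that the "A" prefix stands for Adelpha and that the other prefixes mark different genera; the source does not state this.

```r
# A handful of labels copied from the output above
labels <- c("A_rothschildi", "A_sichaeus", "L_lorquini",
            "L_populi", "Athyma_selenophora", "Pandita_sinope")

# Keep only the text before the first underscore
prefix <- sub("_.*$", "", labels)
table(prefix)

# Labels that start with "A_" (assumed to be Adelpha)
adelpha <- labels[grepl("^A_", labels)]
adelpha
```

With the full tree, replacing labels with mol.tree$tip.label would give the same summary for all 66 tips.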

You can also add arrows to draw attention to specific species!

##add an arrow on a specific branch tip 
plotTree(mol.tree,type="fan",fsize=0.7,lwd=1,
    ftype="i")
add.arrow(mol.tree,tip="A_saundersii_saundersii",arrl=1)

1.5.2 Morphological Tree

In order to compare molecular and morphological trees of the Adelpha lineage, we needed a tree in R built from morphological data, like the one reported in Keith Wilmott’s paper. Because that tree was published in 2003 and is not available in a format that R can read directly, we transcribed it by hand into Newick format based on the phylogenetic relationships presented in Wilmott’s Figure 8.

string <- "((A. alala negra, A. corcyra aretina, A. tracta, A. pithys, A. donysa donysa), ((A. serpacelerio, A. seriphiaaquillia, A. seriphiatherasia), (A. melonaleucocoma, A. salmoneus colada, A. cythereacytherea, A. cythereadaguana, A. epioneagilla, A. etheldaethelda, A. thessaliathessalia, A. iphicleolathessalita, A. iphiclusiphiclus, A. shuara, A. plesaurephliassa, A. basiloides, A. atticaattica, A. leucerioides, A. saundersiisaundersii, A. boreasboreas, A. rothschildi, A. sichaeus, A. cocalacocala, A. leucophthalmairminella, A. irminatumida, A. pollina, A. lycoriaslara, A. lycoriasspruceana, A. erotia erotia, A. mesentina, A. phylaca pseudaethalia, A. capucinus capucinus, A. heraclea heraclea, A. naxia naxia, A. justina valentina, A. olynthia, A. jordani, A. zinazina, A. zinairma, A. delinita delinita, A. boeotia boeotia, A. maleaaethalia)));"

morph.tree<-read.tree(text=string)
plot(morph.tree,no.margin=TRUE,edge.width=2)

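Just as we did with the molecular tree above, we can also plot this tree in a variety of ways. Note that plot() invisibly returns a list of the plotting settings it used, so printing the saved object (morph.tree2 below) displays those settings, including the number of tips and nodes, rather than the tree itself.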

morph.tree2 <- plot(morph.tree, "fan", main="Morphological Tree")

morph.tree2
## $type
## [1] "fan"
## 
## $use.edge.length
## [1] FALSE
## 
## $node.pos
## NULL
## 
## $node.depth
## [1] 1
## 
## $show.tip.label
## [1] TRUE
## 
## $show.node.label
## [1] FALSE
## 
## $font
##  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [39] 3 3 3 3 3 3 3 3
## 
## $cex
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1
## 
## $adj
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
## [39] 0 0 0 0 0 0 0 0
## 
## $srt
## [1] 0
## 
## $no.margin
## [1] FALSE
## 
## $label.offset
## [1] 0
## 
## $x.lim
## [1] -1.430993  1.430993
## 
## $y.lim
## [1] -1.428662  1.428662
## 
## $direction
## [1] "rightwards"
## 
## $tip.color
##  [1] "black" "black" "black" "black" "black" "black" "black" "black" "black"
## [10] "black" "black" "black" "black" "black" "black" "black" "black" "black"
## [19] "black" "black" "black" "black" "black" "black" "black" "black" "black"
## [28] "black" "black" "black" "black" "black" "black" "black" "black" "black"
## [37] "black" "black" "black" "black" "black" "black" "black" "black" "black"
## [46] "black"
## 
## $Ntip
## [1] 46
## 
## $Nnode
## [1] 5
## 
## $root.time
## NULL
## 
## $align.tip.label
## [1] FALSE

We can also draw it as an unrooted tree

morph.tree3 <- plot(morph.tree, "unrooted", main="Morphological Tree")

morph.tree3
## $type
## [1] "unrooted"
## 
## $use.edge.length
## [1] FALSE
## 
## $node.pos
## NULL
## 
## $node.depth
## [1] 1
## 
## $show.tip.label
## [1] TRUE
## 
## $show.node.label
## [1] FALSE
## 
## $font
## [1] 3
## 
## $cex
## [1] 1
## 
## $adj
## [1] 0
## 
## $srt
## [1] 0
## 
## $no.margin
## [1] FALSE
## 
## $label.offset
## [1] 0
## 
## $x.lim
## [1] -1.360677  6.094956
## 
## $y.lim
## [1] -1.360677  4.510391
## 
## $direction
## [1] "rightwards"
## 
## $tip.color
## [1] "black"
## 
## $Ntip
## [1] 46
## 
## $Nnode
## [1] 5
## 
## $root.time
## NULL
## 
## $align.tip.label
## [1] FALSE
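
The settings printed above report 46 tips but only 5 internal nodes, which means the transcribed tree contains large polytomies (nodes with more than two descendants). The sketch below shows how to detect this with {ape}, using a small made-up tree called poly in place of morph.tree.

```r
library(ape)

# Made-up tree with two polytomies: (A,B,C) and (D,E,F,G)
poly <- read.tree(text = "((A,B,C),(D,E,F,G));")

Ntip(poly)       # number of tips
Nnode(poly)      # number of internal nodes actually present
Ntip(poly) - 1   # internal nodes a fully resolved rooted tree would have
is.binary(poly)  # FALSE when the tree contains polytomies
```

Running the same checks on morph.tree would show how much of its branching pattern is unresolved, which matters when it is compared against a fully resolved molecular tree.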

1.6 GGTree

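The ggtree package is distributed through Bioconductor, so it is installed with BiocManager rather than with install.packages(). For more detail on the package, see the ggtree book: https://guangchuangyu.github.io/ggtree-book/chapter-ggtree.html#fig:viewClade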
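
A typical way to install ggtree is sketched below. It needs an internet connection and only has to be run once; the requireNamespace() guards are our addition so that nothing is reinstalled unnecessarily.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install ggtree from Bioconductor if it is not already available
if (!requireNamespace("ggtree", quietly = TRUE)) {
  BiocManager::install("ggtree")
}
```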

Load ggplot2 and ggtree into your markdown file

library(ggplot2)
library(ggtree)
## ggtree v3.3.0.900  For help: https://yulab-smu.top/treedata-book/
## 
## If you use ggtree in published research, please cite the most appropriate paper(s):
## 
## 1. Guangchuang Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics. 2020, 69:e96. doi:10.1002/cpbi.96
## 2. Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(12):3041-3043. doi:10.1093/molbev/msy194
## 3. Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
## 
## Attaching package: 'ggtree'
## The following object is masked from 'package:ape':
## 
##     rotate
g <- ggtree(mol.tree, color="steelblue", size=0.5, linetype="dotted")
g <- g + geom_tiplab(size=2)
g

ggtree(mol.tree, layout="circular") + ggtitle("(Phylogram) circular layout")

multiplot(ggtree(mol.tree), ggtree(morph.tree), ncol=2, labels=c('Molecular Tree', 'Morphological Tree'))

1.7 Comparison

comparePhylo(mol.tree, morph.tree,  plot = FALSE, force.rooted = FALSE,
             use.edge.length = FALSE)
## => Comparing mol.tree with morph.tree.
## Trees have different numbers of tips: 66 and 46.
## Tips in mol.tree not in morph.tree : A_rothschildi, A_sichaeus, A_boreas_boreas, A_saundersii_saundersii, A_attica_attica, A_leuceroides, A_zina_irma, A_zina_zina, A_jordani, A_justina_valentina, A_olynthia, A_boeotia_boeotia, A_malea_aethalia, A_delinita, A_heraclea, A_naxia, A_capucinus_capucinus, A_mesentina, A_phylaca_pseudaethalia, A_lycorias_lara, A_lycorias_spruceana, A_erotia_erotia, A_pollina, A_irmina_tumida, A_leucophthalma_irminella, A_cocala_cocala, L_lorquini, L_weidemeyerii, L_arthemis_arizonensis, L_arthemis_arthemis, L_archippus_floridanensis, L_populi, L_sydyi, L_amphyssa, L_moltrechti, L_doerriesi, L_helmanni, L_camilla, L_homeyeri, L_glorifica, L_reducta, A_donysa_donysa, A_pithys, A_alala_negra, A_tracta, A_corcyra_aretina, Athyma_selenophora, Pandita_sinope, Sumalia_daraxa, Parasarpa_zayla, Moduza_urdaneta, A_seriphia_aquillia, A_seriphia_therasia, A_serpa_celerio, A_melona_leucocoma, A_cytherea_cytherea, A_cytherea_daguana, A_salmoneus_colada, A_iphicleola_thessalita, A_iphiclus_iphiclus, A_thessalia_thessalia, A_epione_agilla, A_ethelda_ethelda, A_basiloides, A_plesaure_phliassa, A_shuara.
## Tips in morph.tree not in mol.tree : A.alalanegra, A.corcyraaretina, A.tracta, A.pithys, A.donysadonysa, A.serpacelerio, A.seriphiaaquillia, A.seriphiatherasia, A.melonaleucocoma, A.salmoneuscolada, A.cythereacytherea, A.cythereadaguana, A.epioneagilla, A.etheldaethelda, A.thessaliathessalia, A.iphicleolathessalita, A.iphiclusiphiclus, A.shuara, A.plesaurephliassa, A.basiloides, A.atticaattica, A.leucerioides, A.saundersiisaundersii, A.boreasboreas, A.rothschildi, A.sichaeus, A.cocalacocala, A.leucophthalmairminella, A.irminatumida, A.pollina, A.lycoriaslara, A.lycoriasspruceana, A.erotiaerotia, A.mesentina, A.phylacapseudaethalia, A.capucinuscapucinus, A.heracleaheraclea, A.naxianaxia, A.justinavalentina, A.olynthia, A.jordani, A.zinazina, A.zinairma, A.delinitadelinita, A.boeotiaboeotia, A.maleaaethalia.
## Trees have different numbers of nodes: 64 and 5.
## mol.tree is unrooted, morph.tree is rooted.
## Both trees are not ultrametric.
all.equal(mol.tree, morph.tree, use.edge.length = TRUE,
                   use.tip.label = TRUE)
## [1] FALSE
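
The output above reports that the two trees share no tip names, even though most Adelpha species appear in both. The labels differ mainly in formatting: read.tree() dropped the spaces from the transcribed string, giving names like A.alalanegra, while the molecular tree uses A_alala_negra. The sketch below shows one possible way to harmonize labels and prune both trees to their shared tips before comparing them. It uses small made-up trees (mol.toy and morph.toy) that imitate the two naming styles, and normalise() is a hypothetical helper written for this example. A few real names also differ in spelling or rank (for example A_heraclea and A.heracleaheraclea) and would still need to be matched by hand.

```r
library(ape)

# Toy stand-ins for mol.tree and morph.tree, imitating their label styles
mol.toy <- read.tree(text =
  "(((A_alala_negra,A_tracta),(A_pithys,A_donysa_donysa)),((A_shuara,A_basiloides),L_lorquini));")
morph.toy <- read.tree(text =
  "((A.alalanegra,A.tracta,A.pithys,A.donysadonysa),(A.shuara,A.basiloides));")

# Hypothetical helper: keep letters only and convert to lower case
normalise <- function(x) tolower(gsub("[^A-Za-z]", "", x))

mol.toy$tip.label <- normalise(mol.toy$tip.label)
morph.toy$tip.label <- normalise(morph.toy$tip.label)

# Keep only the tips present in both trees
shared <- intersect(mol.toy$tip.label, morph.toy$tip.label)
mol.pruned <- keep.tip(mol.toy, shared)
morph.pruned <- keep.tip(morph.toy, shared)

# Compare the pruned trees
comparePhylo(mol.pruned, morph.pruned)
rf <- dist.topo(unroot(mol.pruned), unroot(morph.pruned))
rf  # number of splits not shared by the two trees
```

After pruning, any remaining difference reflects the branching pattern rather than mismatched names or outgroup taxa, although unresolved polytomies in one tree will still count as differences.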